Semi-Supervised Answer Extraction from Discussion Forums

نویسندگان

  • Rose Catherine
  • Rashmi Gangadharaiah
  • Karthik Visweswariah
  • Dinesh Raghu
چکیده

Mining online discussions to extract answers is an important research problem. Methods proposed in the past used supervised classifiers trained on labeled data. But, collecting training data for each target forum is labor intensive and time consuming, thus limiting their deployment. A recent approach had proposed to extract answers in an unsupervised manner, by taking cues from their repetitions. This assumption however, does not hold true in many cases. In this paper, we propose two semi-supervised methods for extracting answers from discussions, which utilize the large amount of unlabeled data available, alongside a very small training set to obtain improved accuracies. We show that it is possible to boost the performance by introducing a related, but parallel task of identifying acknowledgments to the answers. The accuracy achieved by our approaches surpass the baselines by a wide margin, as shown by our experiments.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Identifying Products in Online Cybercrime Marketplaces: A Dataset for Fine-grained Domain Adaptation

One weakness of machine-learned NLP models is that they typically perform poorly on out-of-domain data. In this work, we study the task of identifying products being bought and sold in online cybercrime forums, which exhibits particularly challenging cross-domain effects. We formulate a task that represents a hybrid of slot-filling information extraction and named entity recognition and annotat...

متن کامل

Semi-supervised and Unsupervised Methods for Categorizing Posts in Web Discussion Forums

Semi-supervised and unsupervised methods for categorizing posts in web discussion forums Krish Perumal Master of Science Graduate Department of Computer Science University of Toronto 2016 Web discussion forums are used by millions of people worldwide to share information belonging to a variety of domains such as automotive vehicles, pets, sports, etc. They typically contain posts that fall into...

متن کامل

Semi-supervised and unsupervised categorization of posts in Web discussion forums using part-of-speech information and minimal features

Web discussion forums typically contain posts that fall into different categories such as question, solution, feedback, spam, etc. Automatic identification of these categories can aid information retrieval that is tailored for specific user requirements. Previously, a number of supervised methods have attempted to solve this problem; however, these depend on the availability of abundant trainin...

متن کامل

Reusing Discussion Forums as Learning Resources in Wbt Systems

Discussion forums are highly popular and widely applied tools in Web-based training (WBT) systems. Usually, discussion forums are places where learners discuss topics related to courseware, their current learning task, or the learning project they are working on. These discussion forums contain tremendous educational potential for future learners, since they contain question and answer dialogue...

متن کامل

L'analyse de l'émotion dans les forums de santé (Analysis of Emotion in Health Fora) [in French]

Analysis of Emotion in Health Fora Studies about emotion in fora are numerous in Linguistics and Psychology. This contribution approaches this subject from an Information and Communication Sciences point of view, and studies emotion as a criteron of pertinence for patients in a health forum. This paper introduces the empirical step of automatic language processing in order to answer this questi...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013